CRIES¶

Counting Reads for Intronic and Exonic Segments¶

https://github.com/csglab/CRIES

Step 1. Creating GTF annotation files

Intron and exon annotations for gencode.v34 made within my notebook named build-genome.ipynb.

Step 2. Mapping reads

STAR

Step 3. Counting reads that map to intronic or exonic segments of each gene

featureCounts http://bioinf.wehi.edu.au/featureCounts/

Step 4. Normalization - REMBRANDTS

REMoving Bias from Rna-seq ANalysis of Differential Transcript Stability¶

REMBRANDTS is a package for analysis of RNA-seq data across multiple samples in order to obtain unbiased estimates of differential mRNA stability. It uses DESeq to obtain estimates of differential pre-mRNA and mature mRNA abundance across samples, and then estimates a gene-specific bias function that is then subtracted from Ī”exon–Δintron to provide unbiased differential mRNA stability measures.

Differential stability¶

Limma is originally design for micro-array experiments which is mainly doing same task as DESeq2. Comparing to RNA expression analysis, RNA stability may have negetive values; DESeq2 does not support negetive values but Limma does. Therefore, we decided to use Limma package instead of DESeq2 for differential analysis.

I've learned how to use limma from this DataCamp course | differential-expression-analysis-with-limma-in-r. However, I found these links useful while browsing.

InĀ [44]:
%%R 
volcanoplot(fit2,highlight = 6, names = fit2$genes[,'name'])

Box plot generator for top hits ..

InĀ [45]:
%%R 
temp = fData(eset) %>% rownames_to_column('row')
gene = temp$row[temp$name == 'ABHD2']
# Create a boxplot of a given gene
boxplot(exprs(eset)[gene, ] ~ pData(eset)[, "cond"],main = fData(eset)[gene, "name"])
boxplot(exprs(eset)[gene, ] ~ pData(eset)[, "time"],main = fData(eset)[gene, "name"])
InĀ [46]:
%%R 
temp = fData(eset) %>% rownames_to_column('row')
gene = temp$row[temp$name == 'SORD']
# Create a boxplot of a given gene
boxplot(exprs(eset)[gene, ] ~ pData(eset)[, "cond"],main = fData(eset)[gene, "name"])
boxplot(exprs(eset)[gene, ] ~ pData(eset)[, "time"],main = fData(eset)[gene, "name"])

Principal Component Analysis (PCA)¶

Just like the RNA-seq experssion analysis, sample 72h_treated_rep2 comes up as an outlier. Removing that from the analysis give us a better representation. Therfore, we can see that treated samples at 6h cluster with the non-treated samples which suggest that 6 hours treatment with the drug is not as effective as 72h and 120h. Although, we will check the variant genes in this time-point in the following statistical analysis.

InĀ [28]:
%%R 
# Plot principal components labeled by treatment
filter = !eset@phenoData@data$sample_id == '72h_treated_rep2'
plotMDS(eset[,filter], labels = pData(eset[,filter])[, "time"], col=col_by_cond[filter], gene.selection = "common")

Normalizing and filtering¶

InĀ [50]:
%%R
# Quantile normalize
exprs(eset_norm) <- normalizeBetweenArrays(exprs(eset_norm))
plotDensities(eset_norm, legend = FALSE)

Let's write normalized RNA-Stabilities into a file.

InĀ [51]:
%%R 
ncu <- exprs(eset_norm) %>% replace_na(0)
write.table(ncu,'stbl_norm_log_quantile.txt', sep="\t", quote=FALSE, col.names=TRUE)

ncu %>% summary
  6h_DMSO_rep1     6h_DMSO_rep2    6h_treated_rep1  6h_treated_rep2 
 Min.   :-9.527   Min.   :-9.527   Min.   :-9.527   Min.   :-9.527  
 1st Qu.:-2.417   1st Qu.:-2.393   1st Qu.:-2.170   1st Qu.:-2.382  
 Median :-1.231   Median :-1.137   Median : 0.000   Median :-1.098  
 Mean   :-1.372   Mean   :-1.345   Mean   :-1.119   Mean   :-1.335  
 3rd Qu.: 0.000   3rd Qu.: 0.000   3rd Qu.: 0.000   3rd Qu.: 0.000  
 Max.   : 0.000   Max.   : 0.000   Max.   : 0.000   Max.   : 0.000  
 72h_DMSO_rep1    72h_DMSO_rep2    72h_treated_rep1  72h_treated_rep2
 Min.   :-9.527   Min.   :-9.527   Min.   :-9.5268   Min.   :-9.527  
 1st Qu.:-2.230   1st Qu.:-2.301   1st Qu.:-1.9783   1st Qu.:-2.469  
 Median : 0.000   Median : 0.000   Median : 0.0000   Median :-1.389  
 Mean   :-1.175   Mean   :-1.247   Mean   :-0.9717   Mean   :-1.437  
 3rd Qu.: 0.000   3rd Qu.: 0.000   3rd Qu.: 0.0000   3rd Qu.: 0.000  
 Max.   : 0.000   Max.   : 0.000   Max.   : 0.0000   Max.   : 0.000  
 120h_DMSO_rep1   120h_DMSO_rep2   120h_treated_rep1 120h_treated_rep2
 Min.   :-9.527   Min.   :-9.527   Min.   :-9.527    Min.   :-9.527   
 1st Qu.:-2.358   1st Qu.:-2.506   1st Qu.:-2.246    1st Qu.:-2.203   
 Median :-0.948   Median :-1.483   Median : 0.000    Median : 0.000   
 Mean   :-1.307   Mean   :-1.484   Mean   :-1.190    Mean   :-1.147   
 3rd Qu.: 0.000   3rd Qu.: 0.000   3rd Qu.: 0.000    3rd Qu.: 0.000   
 Max.   : 0.000   Max.   : 0.000   Max.   : 0.000    Max.   : 0.000   
InĀ [33]:
%%R 
# Obtain the summary statistics 
stats_120h <- topTable(fit2, coef = "log2FC_120h", number = nrow(fit2),
                              sort.by = "none")

# Create histograms of the p-values for each contrast
hist(stats_120h[, "P.Value"])
InĀ [34]:
%%R 
# Obtain the summary statistics 
stats_6h <- topTable(fit2, coef = "log2FC_6h", number = nrow(fit2),
                              sort.by = "none")

# Create histograms of the p-values for each contrast
hist(stats_6h[, "P.Value"])
InĀ [Ā ]:
 
InĀ [Ā ]:
 

Enrichment testing¶

human_ensembl

human_ensembl_msigdb_c1

human_ensembl_msigdb_c2

human_ensembl_msigdb_c3

human_ensembl_msigdb_c4

human_ensembl_msigdb_c5

human_ensembl_msigdb_c6

human_ensembl_msigdb_c7

human_ensembl_msigdb_full

human_ensembl_msigdb_h

human_ensembl_RBPs_all_gene_ids

human_ensembl_RBPs_coding_gene_ids

human_ensembl_RBPs_coding_gene_ids_by_3UTR

human_ensembl_RBPs_coding_gene_ids_by_5UTR

human_ensembl_RBPs_coding_gene_ids_by_coding_exons

human_ensembl_RBPs_coding_gene_ids_by_introns

InĀ [114]:
%%R 
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS/LAPACK: /rumi/shams/abe/anaconda3/envs/limma/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    parallel  tools     stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.12      forcats_0.5.0        stringr_1.4.0       
 [4] dplyr_1.0.1          purrr_0.3.4          readr_1.3.1         
 [7] tidyr_1.1.1          tibble_3.0.3         ggplot2_3.3.2       
[10] tidyverse_1.3.0      rtracklayer_1.48.0   GenomicRanges_1.40.0
[13] GenomeInfoDb_1.24.0  IRanges_2.22.1       S4Vectors_0.26.0    
[16] Biobase_2.48.0       BiocGenerics_0.34.0  edgeR_3.30.0        
[19] limma_3.44.1        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6                locfit_1.5-9.4             
 [3] lubridate_1.7.9             lattice_0.20-41            
 [5] Rsamtools_2.4.0             Biostrings_2.56.0          
 [7] assertthat_0.2.1            R6_2.4.1                   
 [9] cellranger_1.1.0            backports_1.1.8            
[11] reprex_0.3.0                httr_1.4.2                 
[13] pillar_1.4.6                zlibbioc_1.34.0            
[15] rlang_0.4.7                 readxl_1.3.1               
[17] rstudioapi_0.11             blob_1.2.1                 
[19] Matrix_1.2-18               BiocParallel_1.22.0        
[21] RCurl_1.98-1.2              munsell_0.5.0              
[23] DelayedArray_0.14.0         broom_0.7.0                
[25] compiler_4.0.2              modelr_0.1.8               
[27] pkgconfig_2.0.3             tidyselect_1.1.0           
[29] SummarizedExperiment_1.18.1 GenomeInfoDbData_1.2.3     
[31] matrixStats_0.56.0          XML_3.99-0.3               
[33] fansi_0.4.1                 withr_2.2.0                
[35] crayon_1.3.4                dbplyr_1.4.4               
[37] GenomicAlignments_1.24.0    bitops_1.0-6               
[39] grid_4.0.2                  jsonlite_1.7.0             
[41] gtable_0.3.0                lifecycle_0.2.0            
[43] DBI_1.1.0                   magrittr_1.5               
[45] scales_1.1.1                cli_2.0.2                  
[47] stringi_1.4.6               XVector_0.28.0             
[49] fs_1.5.0                    xml2_1.3.2                 
[51] ellipsis_0.3.1              generics_0.0.2             
[53] vctrs_0.3.2                 RColorBrewer_1.1-2         
[55] glue_1.4.1                  hms_0.5.3                  
[57] colorspace_1.4-1            rvest_0.3.6                
[59] haven_2.3.1